Before we begin, start a new file with File \(\rightarrow\) New File \(\rightarrow\) R Script. As you work through this sheet in the R console, also copy and paste the commands that work into this new file. At the end, save the file and run it to execute all of your commands at once.
The gapminder package uses a small snippet of this data for exploratory analysis. Install and load the package gapminder. Type ?gapminder and hit enter to see a description of the data.

# first time only
# install.packages("gapminder")
# load gapminder
library(gapminder)
# get help
?gapminder

Let’s look at gapminder to see what we’re dealing with.

Get the structure of the gapminder data.

str(gapminder)
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
# - country: a factor
# - continent: a factor
# - year: an integer
# - lifeExp: a number
# - pop: an integer
# - gdpPercap: a number

Look at the head of the dataset to get an idea of what the data looks like.

head(gapminder)

Get summary statistics of all variables.

summary(gapminder)
##         country        continent        year         lifeExp
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85
##  Australia  :  12                  Max.   :2007   Max.   :82.60
##  (Other)    :1632
##       pop              gdpPercap
##  Min.   :6.001e+04   Min.   :   241.2
##  1st Qu.:2.794e+06   1st Qu.:  1202.1
##  Median :7.024e+06   Median :  3531.8
##  Mean   :2.960e+07   Mean   :  7215.3
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5
##  Max.   :1.319e+09   Max.   :113523.1
##
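The summary shows exactly 12 rows for each listed country, so the row counts per continent are larger than the number of countries. As a minimal sketch in base R (assuming the gapminder package is installed), the distinct countries in each continent can be counted like this:

```r
library(gapminder)

# count distinct countries per continent (each row is a country-year pair)
countries_per_continent <- tapply(
  as.character(gapminder$country),
  gapminder$continent,
  function(x) length(unique(x))
)
countries_per_continent

# rows per continent, for comparison with the summary() output
table(gapminder$continent)
```

Dividing the continent counts in the summary by 12 should give the same numbers, for example 624 / 12 = 52 for Africa.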
Use base R’s hist() function to plot a histogram of gdpPercap.

hist(gapminder$gdpPercap)

Use base R’s boxplot() function to plot a boxplot of gdpPercap.

boxplot(gapminder$gdpPercap)

Make a boxplot of gdpPercap by continent.¹

boxplot(gapminder$gdpPercap~gapminder$continent)
# alternate method
# boxplot(gdpPercap~continent, data = gapminder)

Make a scatterplot with gdpPercap on the \(x\)-axis and lifeExp on the \(y\)-axis.

plot(gapminder$lifeExp~gapminder$gdpPercap)
# alternate method
# plot(lifeExp~gdpPercap, data = gapminder)

ggplot2

Load the package ggplot2 (you should have installed it previously; if not, install it first with install.packages("ggplot2")).

# install if you don't have
# install.packages("ggplot2")
# load ggplot2
library(ggplot2)

Make a bar graph to see how many observations are in each continent. Because every country appears once per survey year, the bars count country-year rows, not distinct countries. The only aesthetic you need is to map continent to x. Bar graphs are great for representing categories, but not quantitative data.

ggplot(data = gapminder,
       aes(x = continent))+
  geom_bar()

Use a histogram to visualize the distribution of a quantitative variable. Make a histogram of gdpPercap. Your only aesthetic here is to map gdpPercap to x.

ggplot(data = gapminder,
       aes(x = gdpPercap))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Now add an aesthetic that maps continent to fill.²

ggplot(data = gapminder,
       aes(x = gdpPercap,
           fill = continent))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
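The message under both histograms is ggplot2 noting that it fell back to 30 bins. A small sketch of choosing the bins explicitly follows; the values 50 and 5000 are arbitrary illustrations, not recommendations from this sheet.

```r
library(gapminder)
library(ggplot2)

# set the number of bins directly
p_bins <- ggplot(data = gapminder,
                 aes(x = gdpPercap))+
  geom_histogram(bins = 50)

# or set the width of each bin, in the units of gdpPercap
p_width <- ggplot(data = gapminder,
                  aes(x = gdpPercap))+
  geom_histogram(binwidth = 5000)

p_bins
p_width
```

Either argument silences the message, since the bin choice is no longer left to the default.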
Instead of a histogram, change the geom to make it a density graph. To avoid overplotting, add alpha=0.4 as an argument to the geom (alpha changes the transparency of a fill).

ggplot(data = gapminder,
       aes(x = gdpPercap,
           fill = continent))+
  geom_density(alpha=0.4)

Redo the density graph for lifeExp instead of gdpPercap.

ggplot(data = gapminder,
       aes(x = lifeExp,
           fill = continent))+
  geom_density(alpha=0.4)

Now make a scatterplot of lifeExp (as y) on gdpPercap (as x). You’ll need both as aesthetics. The geom here is geom_point().

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point()

Add some color by mapping continent to color in your aesthetics.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp,
           color = continent))+
  geom_point()

Now add a regression line with geom_smooth(). Add this layer on top of your geom_point() layer.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp,
           color = continent))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Notice that you got multiple regression lines, one per continent. That is because we set a global aesthetic of mapping continent to color. If we want just one regression line, we need to instead move the color = continent inside the aes of geom_point. This will only map continent to color for points, not for anything else.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent))+
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
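To see the effect of moving color = continent from the global aes() into geom_point(), one option is to count how many groups the smoothing layer is fit to. The sketch below uses layer_data() for that, and assumes method = "lm" purely to keep the fit quick and quiet; the plots above use the default smoother.

```r
library(gapminder)
library(ggplot2)

# global mapping: every layer, including the smoother, is split by continent
p_global <- ggplot(data = gapminder,
                   aes(x = gdpPercap,
                       y = lifeExp,
                       color = continent))+
  geom_point()+
  geom_smooth(method = "lm", formula = y ~ x)

# local mapping: only the points are colored by continent
p_local <- ggplot(data = gapminder,
                  aes(x = gdpPercap,
                      y = lifeExp))+
  geom_point(aes(color = continent))+
  geom_smooth(method = "lm", formula = y ~ x)

# layer 2 is the smoother; count the groups it was fit to
n_global <- length(unique(layer_data(p_global, 2)$group))
n_local <- length(unique(layer_data(p_local, 2)$group))
c(global = n_global, local = n_local)
```

This should report five groups for the global version (one per continent) and a single group for the local version.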
Now add an aesthetic to your points to map pop to size.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Change the color of the regression line to "black". Try first by putting this inside an aes() in your geom_smooth, and try a second time by just putting it inside geom_smooth without an aes(). What’s the difference, and why?

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth(aes(color = "black"))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
# putting it inside aesthetics maps color to a constant variable whose
# only value is the string "black"; ggplot2 treats that as data, so the
# line gets a color from the color scale (not black) and a legend entry

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth(color = "black")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
# putting it outside aesthetics (correctly) sets color to black

Another way to separate out the continents is with faceting. Add +facet_wrap(~continent) to create subplots by continent.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth(color = "black")+
  facet_wrap(~continent)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
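Returning to the "black" comparison above: the difference between mapping and setting a color can also be checked without looking at the plot. This sketch inspects the colour that the smoothing layer actually receives, using layer_data(); method = "lm" and the object names are assumptions made only to keep the example short and fast.

```r
library(gapminder)
library(ggplot2)

base <- ggplot(data = gapminder,
               aes(x = gdpPercap,
                   y = lifeExp))

# mapped: "black" is treated as a data value and run through the color scale
p_mapped <- base+
  geom_smooth(aes(color = "black"), method = "lm", formula = y ~ x)

# set: "black" is used directly as the line color
p_set <- base+
  geom_smooth(color = "black", method = "lm", formula = y ~ x)

mapped_colour <- unique(layer_data(p_mapped, 1)$colour)  # a scale color, not "black"
set_colour <- unique(layer_data(p_set, 1)$colour)        # the literal "black"
mapped_colour
set_colour
```

The mapped version also produces a legend with a single entry labelled "black", which is a quick visual cue that a constant ended up inside aes().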
Remove the facet layer. The scale of the x-axis is awkward: most of the points are clustered at the low end. Let’s try changing the scale by adding a layer: +scale_x_log10().

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth(color="black")+
  scale_x_log10()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
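To see why the log scale spreads the points out, it can help to look at the numbers themselves. As a rough sketch in base R, compare the range of gdpPercap with the range of its base-10 logarithm, which is what scale_x_log10() positions the points by:

```r
library(gapminder)

# raw values span several orders of magnitude
raw_range <- range(gapminder$gdpPercap)
raw_range

# on the log10 scale the same values span only about 2.4 to 5.1
log_range <- range(log10(gapminder$gdpPercap))
log_range

# share of observations below the mean: most points sit at the low end
share_below_mean <- mean(gapminder$gdpPercap < mean(gapminder$gdpPercap))
share_below_mean
```

The summary above already hints at this skew: the median of gdpPercap (3531.8) is less than half of the mean (7215.3).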
Now fix the labels by adding +labs(). Inside labs, make proper axis titles for x and y, and a title for the plot. If you want to change the names of the legends (continent color and population size), add entries for color and size.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth(color="black")+
  scale_x_log10()+
  labs(x = "GDP per Capita",
       y = "Life Expectancy",
       # example wording for the plot title; use any title you like
       title = "Life Expectancy and GDP per Capita",
       color = "Continent",
       size = "Population")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Take the gapminder dataframe and subset it to only look at the Americas (continent=="Americas"). Assign this to a new dataframe object (call it something like america). Now, use this as your data, and redo the graph from question 17. (You might want to take a look at your new dataframe to make sure it worked first!)

america<-gapminder[gapminder$continent=="Americas",]
# verify this worked
america

ggplot(data = america,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# subset to observations from the year 2002
gap_2002<-gapminder[gapminder$year==2002,]
# verify this worked
gap_2002

ggplot(data = gap_2002,
       aes(x = gdpPercap,
           y = lifeExp))+
  geom_point(aes(color = continent,
                 size = pop))+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
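The two subsets above use logical indexing with square brackets. As a sketch of an equivalent approach, base R's subset() selects the same rows, and nrow() offers a quick check against the counts in the summary (300 rows for the Americas). The names america_alt and gap_2002_alt are made up for this example.

```r
library(gapminder)

# bracket indexing, as in the answers above
america <- gapminder[gapminder$continent == "Americas", ]
gap_2002 <- gapminder[gapminder$year == 2002, ]

# the same subsets with subset(), which looks up column names in the data
america_alt <- subset(gapminder, continent == "Americas")
gap_2002_alt <- subset(gapminder, year == 2002)

# quick checks that the subsets have the expected size and contents
nrow(america)
nrow(america_alt)
nrow(gap_2002)
unique(gap_2002_alt$year)
```

With one row per country in a single year, the 2002 subset should have as many rows as there are countries in the data.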